Searching for gravitational waves from binary coalescence 
We describe the implementation of a search for gravitational waves from compact binary coalescences
in LIGO and Virgo data. This all-sky, all-time, multi-detector search for binary coalescence
has been used to search data taken in recent LIGO and Virgo runs. The search is built around 
a matched filter analysis of the data, augmented by numerous signal consistency tests designed to 
distinguish artifacts of non-Gaussian detector noise from potential detections. We demonstrate the 
search performance using Gaussian noise and data from the fifth LIGO science run and demon- 
strate that the signal consistency tests are capable of mitigating the effect of non-Gaussian noise 
and providing a sensitivity comparable to that achieved in Gaussian noise. 



I. INTRODUCTION 

Coalescing binaries of compact objects such as neutron 
stars (NSs) and stellar-mass black holes (BHs) are 
promising gravitational-wave (GW) sources for ground- 
based, kilometer-scale interferometric detectors such as 
LIGO [1], Virgo [2], and GEO600 [3], which are sensitive
to waves of frequencies between tens and thousands
of Hertz. Numerous searches for these signals were per- 
formed on data from the six LIGO and GEO science runs 
(S1-S6) and from the four Virgo science runs (VSR1-4)
[4-14].

Over time, the software developed to run these searches 
and evaluate the significance of results evolved into a 
sophisticated pipeline, known as ihope. An early version 
of the pipeline was described in [15]. In this paper, we
describe the ihope pipeline in detail and we characterize 
its detection performance by comparing the analysis of 
a month of real data with the analysis of an equivalent 
length of simulated data with Gaussian stationary noise. 

Compact binary coalescences (CBCs) consist of three 
dynamical phases: a gradual inspiral, which is described 
accurately by the post-Newtonian approximation to the 
Einstein equations [16]; a nonlinear merger, which can
be modeled with numerical simulations (see [17-19] for
recent reviews); and the final ringdown of the merged
object to a quiescent state [20]. For the lighter NS-NS 
systems, only the inspiral lies within the band of detector 
sensitivity. Since CBC waveforms are well modeled, it is 
natural to search for them by matched-filtering the data 
with banks of theoretical template waveforms [21].

The most general CBC waveform is described by seven- 
teen parameters, which include the masses and intrinsic 
spins of the binary components, as well as the location, 
orientation, and orbital elements of the binary. It is not 
feasible to perform a search by placing templates across 
such a high-dimensional parameter space. However, it is 
astrophysically reasonable to neglect orbital eccentricity 
[22, 23]; furthermore, CBC waveforms that omit the effects
of spins have been shown to have acceptable phase
overlaps with spinning-binary waveforms, and are there- 
fore suitable for the purpose of detecting CBCs, if not to 
estimate their parameters accurately [24].

Thus, CBC searches so far have relied on nonspinning 
waveforms that are parameterized only by the component 
masses, by the location and orientation of the binary, by 
the initial orbital phase, and by the time of coalescence. 
Among these parameters, the masses determine the in- 
trinsic phasing of the waveforms, while the others affect 
only the relative amplitudes, phases, and timing observed 
at multiple detector sites [25]. It follows that templates
need to be placed only across the two-dimensional parameter
space spanned by the masses [25]. Even so, past CBC
searches have required many thousands of templates to 
cover their target ranges of masses. (We note that ihope 
could be extended easily to nonprecessing binaries with 
aligned spins. However, more general precessing waveforms
would prove more difficult, as discussed in [26-29].)

In the context of stationary Gaussian noise, matched- 
filtering would directly yield the most statistically sig- 
nificant detection candidates. In practice, environmental 
and instrumental disturbances cause non-Gaussian noise 
transients (glitches) in the data. Searches must distin- 
guish between the candidates, or triggers, resulting from 
glitches and those resulting from true GWs. The techniques
developed for this challenging task include coincidence
(signals must be observed in two or more detectors
with consistent mass parameters and times of arrival),
signal-consistency tests (which quantify how well
a signal's amplitude and frequency evolution agree
with theoretical waveforms [30]), and data-quality vetoes
(which identify time periods when the detector glitch rate 
is elevated). We describe these in detail later. 

The statistical significance after the consistency tests 
have been applied is then quantified by computing the 
false alarm probability (FAP) or false alarm rate (FAR) 
of each candidate; we define both below. For this, the 
background of noise-induced candidates is estimated by 
performing time shifts, whereby the coincidence and con- 
sistency tests are run after imposing relative time offsets 
on the data from different detectors. Any consistent can- 
didate found in this way must be due to noise; further- 
more, if the noise of different detectors is uncorrelated, 
the resulting background rate is representative of the rate 
at zero shift. 

The sensitivity of the search to CBC waves is estimated 
by adding simulated signals (injections) to the detector 
data, and verifying which are detected by the pipeline. 
With this diagnostic we can tune the search to a specific 
class of signals (e.g., a region in the mass plane), and 
we can give an astrophysical interpretation, such as an 
upper limit on CBC rates [31], to completed searches.

As discussed below, commissioning a GW search with 
the ihope pipeline requires a number of parameter tun- 
ings, which include the handling of coincidences, the 
signal-consistency tests, and the final ranking of triggers. 
To avoid biasing the results, ihope permits a blind anal- 
ysis: the results of the non-time-shifted analysis can be 
sequestered, and tuning performed using only the injec- 
tions and time-shifted results. Later, with the parameter 
tunings frozen, the non-time-shifted results can be un- 
blinded to reveal the candidate GW events. 

This paper is organized as follows. In Sec. II we provide
a brief overview of the ihope pipeline, and describe its 
first few stages (data conditioning, template placement, 
filtering, coincidence), which would be sufficient to im- 
plement a search in Gaussian noise but not, as we show, 
in real detector data. In Sec. III we describe the various
techniques that have been developed to eliminate the majority
of background triggers due to non-Gaussian noise.
In Sec. IV we describe how the ihope results are used
to make astrophysical statements about the presence or 
absence of signals in the data, and to put constraints on 
CBC event rates. Last, in Sec. V we discuss ways in
which the analysis can be enhanced to improve sensitiv- 
ity, reduce latency, and find use in the advanced-detector 
era. 

Throughout this paper we show representative ihope 
output, taken from a search of one month of LIGO data 
from the S5 run (the third month in [12]), when all three
LIGO detectors (but not Virgo) were operational. The 
search focused on low-mass CBC signals with component 
masses > 1 $M_\odot$ and total mass < 25 $M_\odot$. For comparison,
we also run the same search on Gaussian noise generated
at the design sensitivity of the Laser Interferometer
Gravitational-wave Observatory (LIGO) detectors (using 
the same data times as the real data). Where we perform 
GW-signal injections (see Sec. IV C), we adopt a population
of binary-neutron-star inspirals, uniformly distributed
in distance, coalescence time, sky position and
orientation angles. 



II. IHOPE, PART 1: SETTING UP A 
MATCHED-FILTERING SEARCH WITH 
MULTIPLE-DETECTOR COINCIDENCE 

The stages of the ihope pipeline are presented
schematically in Fig. 1 and are described in detail in
Secs. II-IV of this paper. First, the science data to be
analyzed are identified and split into 2048 s blocks, and the
power spectral density is estimated for each block (see
Sec. II A). Next, a template bank is constructed independently
for each detector and each block (Sec. II B). The
data blocks are matched-filtered against each bank template,
and the times when the signal-to-noise ratio (SNR)
rises above a set threshold are recorded as triggers (Sec.
II C). The triggers from each detector are then compared
to identify coincidences, that is, triggers that occur in
two or more detectors with similar masses and compatible
times (Sec. II D).



If detector noise were Gaussian and stationary, we could
proceed directly to the statistical interpretation of the
triggers. Unfortunately, non-Gaussian noise glitches generate
both an increase in the number of low-SNR triggers
and high-SNR triggers that form long tails in the
distribution of SNRs. The increase in low-SNR triggers
will cause a small, but inevitable, reduction in the sensitivity
of the search. It is, however, vital to distinguish
the high-SNR background triggers from those caused by
real GW signals. To achieve this, the coincident triggers
are used to generate a reduced template bank for a
second round of matched-filtering in each detector (see
the beginning of Sec. III). This time, signal-consistency
tests are performed on each trigger to help differentiate
background from true signals (Secs. III A and III B). These
tests are computationally expensive, so we reserve them
for this second pass. Single-detector triggers are again
compared for coincidence, and the final list is clustered
and ranked (Sec. III E), taking into account signal consistency,
amplitude consistency among detectors (Sec.
III C), as well as the times in which the detectors were not
operating optimally (Sec. III D). These steps leave coincident
triggers that have a quasi-Gaussian distribution;
they can now be evaluated for statistical significance, and
used to derive event-rate upper limits in the absence of
a detection.

To do this, the steps of the search that involve coincidence
are repeated many times, artificially shifting
the time stamps of triggers in different detectors, such
that no true GW signal would actually be found in coincidence
(Sec. IV A). The resulting time-shift triggers
are used to calculate the FAR of the in-time (zero-shift)
triggers. Those with FAR lower than some threshold are
the GW-signal candidates (Sec. IV B). Simulated GW
signals are then injected into the data, and by observing
which injections are recovered as triggers with FAR
lower than some threshold, we can characterize detection
efficiency as a function of distance and other parameters
(Sec. IV C), providing an astrophysical interpretation for
the search. Together with the FARs of the loudest triggers,
the efficiency yields the upper limits (Sec. IV D).

FIG. 1. Structure of the ihope pipeline.



A. Data segmentation and conditioning, 
power-spectral-density generation 

As a first step in the pipeline, ihope identifies the 
stretches of detector data that should be analyzed: for 
each detector, such science segments are those for which 
the detector was locked (i.e., interferometer laser light 
was resonant in Fabry-Perot cavities [1]), no other experimental
work was being performed, and the detector's
"science mode" was confirmed by a human "science
monitor." ihope builds a list of science-segment times 
by querying a network-accessible database that contains 
this information for all detectors. 

The LIGO and Virgo GW-strain data are sampled 
at 16384 Hz and 20000 Hz, respectively, but both are
down-sampled to 4096 Hz prior to analysis [15], since at
frequencies above 1 kHz to 2 kHz detector noise over- 
whelms any likely CBC signal. This sampling rate sets 
the Nyquist frequency at 2048 Hz; to prevent aliasing, 
the data are preconditioned with a time-domain digital 
filter with low-pass cutoff at the Nyquist frequency [15].
While CBC signals extend to arbitrarily low frequencies, 
detector sensitivity degrades rapidly, so very little GW 
power could be observed below 40 Hz. Therefore, we 
usually suppress signals below 30 Hz with two rounds of 
8th-order Butterworth high-pass filters, and analyze data 
only above 40 Hz. 

Both the low- and high-pass filters corrupt the data 
at the start and end of a science segment, so the first 
and last few seconds of data (typically 8 s) are discarded 
after applying the filters. Furthermore, SNRs are computed
by correlating templates with the (noise-weighted)
data stream, which is only possible if a stretch of data 
of at least the same length as the template is available. 
Altogether, the data are split into 256 s segments, and 
the first and last 64 s of each segment are not used in the 
search. Neighboring segments are overlapped by 128 s to 
ensure that all available data are analyzed.
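The segment layout described above can be illustrated with a short sketch. The function name and its return format are hypothetical, chosen only for this illustration; the numbers (2048 s block, 256 s segments, 128 s overlap, 64 s unused at each segment end) are taken from the text.

```python
def analysis_segments(block_start, block_len=2048, seg_len=256,
                      overlap=128, pad=64):
    """List the overlapping segments of one block.

    Each entry is (seg_start, seg_end, search_start, search_end), where the
    first and last `pad` seconds of every segment are excluded from the search.
    """
    stride = seg_len - overlap
    segments = []
    start = block_start
    while start + seg_len <= block_start + block_len:
        segments.append((start, start + seg_len,
                         start + pad, start + seg_len - pad))
        start += stride
    return segments


segments = analysis_segments(0)
print(len(segments))   # number of 256 s segments in one 2048 s block
print(segments[0], segments[-1])
```

With these values each segment contributes 128 s of searched data, and the searched intervals of neighboring segments abut without gaps.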

The strain power spectral density (PSD) is computed 
separately for every 2048 s block of data (consisting of 
15 overlapping 256 s segments). The blocks themselves 
are overlapped by 128 s. The block PSD is estimated by 
taking the median [32] (in each frequency bin) of the seg- 
ment PSDs, ensuring robustness against noise transients 
and GW signals (whether real or simulated). The PSD is 
used in the computation of SNRs, and to set the spacing 
of templates in the banks. Science segments shorter than 
2064 s (2048 s block length and 16 s to account for the 
padding on either side) are not used in the analysis, since 
they cannot provide an accurate PSD estimate. 
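As an illustration of the median estimate, the following is a minimal sketch and not the pipeline's implementation. The Hann window, the normalization, and the function name `median_psd` are assumptions made for this example; no median-bias correction is applied.

```python
import numpy as np


def median_psd(block, sample_rate, seg_len_s=256, overlap_s=128):
    """One-sided PSD of a block: per-bin median over overlapping segments."""
    n_seg = int(seg_len_s * sample_rate)
    stride = int((seg_len_s - overlap_s) * sample_rate)
    window = np.hanning(n_seg)                     # assumed window choice
    norm = sample_rate * np.sum(window ** 2)
    segment_psds = []
    for start in range(0, len(block) - n_seg + 1, stride):
        spectrum = np.fft.rfft(block[start:start + n_seg] * window)
        segment_psds.append(2.0 * np.abs(spectrum) ** 2 / norm)
    freqs = np.fft.rfftfreq(n_seg, d=1.0 / sample_rate)
    # The median is insensitive to a few segments corrupted by a glitch or a
    # loud signal; it is biased low relative to the mean (by about ln 2 for
    # many segments), which this sketch does not correct.
    return freqs, np.median(segment_psds, axis=0)
```

A single loud transient affects at most two of the fifteen overlapping segments, so it cannot move the per-bin median far, which is the robustness property the text relies on.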



B. Template-bank generation 

Template banks must be sufficiently dense in param- 
eter space to ensure a minimal loss of matched-filtering 
SNR for any CBC signal within the mass range of in- 
terest; however, the computational cost of a search is 
proportional to the number of templates in a bank. The 
method used to place templates must balance these considerations.
This problem is well explored for nonspinning
CBC signals [33-39], for which templates need only
be placed across the two-dimensional intrinsic-parameter 
space spanned by the two component masses. The other 
extrinsic parameters enter only as amplitude scalings or 
phase offsets, and the SNR can be maximized analytically 
over these parameters after filtering by each template. 

Templates are placed in parameter space so that the
match between any GW signal and the best-fitting template
is better than a minimum match MM (typically
97%). The match between signals $h$ with parameter vectors
$\xi_1$ and $\xi_2$ is defined as

$$ \mathcal{M}(\xi_1, \xi_2) = \max_{t_c, \phi_c} \big( \hat{h}(\xi_1) \,\big|\, \hat{h}(\xi_2) \big), \tag{1} $$

where $\hat{h} = h / \sqrt{(h|h)}$ is the unit-norm waveform,
$t_c$ and $\phi_c$ are the time and phase of coalescence
of the signal, and $(\cdot|\cdot)$ is the standard noise-weighted inner
product

$$ (a|b) = 4\,\mathrm{Re} \int_{f_{\rm low}}^{f_{\rm high}} \frac{\tilde{a}(f)\, \tilde{b}^*(f)}{S_n(f)}\, df, \tag{2} $$

with $S_n(f)$ the one-sided detector-noise PSD. The MM
represents the worst-case reduction in matched-filtering
SNR, and correspondingly the worst-case reduction in the
maximum detection distance of a search. Thus, under the
assumption of sources uniformly distributed in volume,
the fraction of sources retained despite template-bank discreteness
is bounded below by MM$^3$, so the loss in sensitivity
is at most ~ 10% for MM = 97%.
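The arithmetic behind this bound is short enough to check directly. The sketch below assumes only what the text states: detection distance scales with SNR, and sources are uniform in volume, so the detectable volume scales as the cube of the retained SNR fraction. The function name is illustrative.

```python
def max_fractional_loss(minimal_match):
    """Worst-case fraction of sources lost to template-bank discreteness."""
    # Distance reach scales with SNR, volume with distance cubed.
    return 1.0 - minimal_match ** 3


loss = max_fractional_loss(0.97)
print(f"{loss:.3f}")   # about 0.087, i.e. roughly 9%
```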

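It is computationally expensive to obtain template mismatches
for pairs of templates using Eq. (1), so an approximation
based on a parameter-space metric is used
instead:

$$ 1 - \mathcal{M}(\xi, \xi + \Delta\xi) \simeq \sum_{ij} g_{ij}(\xi)\, \Delta\xi^i \Delta\xi^j, \tag{3} $$

$$ g_{ij}(\xi) = -\frac{1}{2} \left. \frac{\partial^2 \mathcal{M}(\xi, \xi + \Delta\xi)}{\partial \Delta\xi^i\, \partial \Delta\xi^j} \right|_{\Delta\xi = 0}. \tag{4} $$
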
The approximation holds as long as the metric is roughly
constant between bank templates, and is helped by
choosing parameters (i.e., coordinates $\xi$) that make the
metric almost flat, such as the "chirp times" $\tau_0$, $\tau_3$ given
by [40]

$$ \tau_0 = \frac{5}{256 \pi f_{\rm low} \eta} \left( \frac{\pi G M f_{\rm low}}{c^3} \right)^{-5/3}, \tag{5} $$

$$ \tau_3 = \frac{1}{8 f_{\rm low} \eta} \left( \frac{\pi G M f_{\rm low}}{c^3} \right)^{-2/3}. \tag{6} $$

Here $M$ is the total mass, $\eta = m_1 m_2 / M^2$ is the symmetric mass ratio,
and $f_{\rm low}$ is the lower frequency cutoff
used in the template generation.
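For concreteness, the chirp times can be evaluated numerically. This is a small sketch of the standard expressions for $\tau_0$ and $\tau_3$; the function name and the rounded physical constants are choices made for this example, not part of ihope.

```python
import math

G = 6.67430e-11        # m^3 kg^-1 s^-2
C = 299792458.0        # m s^-1
MSUN = 1.98847e30      # kg


def chirp_times(m1, m2, f_low=40.0):
    """Return (tau0, tau3) in seconds for component masses in solar masses."""
    total = m1 + m2
    eta = m1 * m2 / total ** 2                       # symmetric mass ratio
    x = math.pi * G * total * MSUN * f_low / C ** 3  # dimensionless
    tau0 = 5.0 / (256.0 * math.pi * f_low * eta) * x ** (-5.0 / 3.0)
    tau3 = 1.0 / (8.0 * f_low * eta) * x ** (-2.0 / 3.0)
    return tau0, tau3


tau0, tau3 = chirp_times(1.4, 1.4)
print(tau0, tau3)
```

For a 1.4 + 1.4 solar-mass binary and a 40 Hz cutoff, $\tau_0$ comes out at roughly 25 s, which matches the familiar duration of a binary-neutron-star inspiral in this band.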

For the S5-S6 and VSR1-3 CBC searches, templates 
were placed on a regular hexagonal lattice in $\tau_0$-$\tau_3$ space
[33], sized so that MM would be 97% [ITJ E3 QJ]. The 
metric was computed using inspiral waveforms at the 
second post-Newtonian (2PN) order in phase. Higher- 
order templates are now used in searches (some including 
merger and ringdown), but not for template placement; 
work is ongoing to implement that. Figure 2 shows a
typical template bank in both $m_1$-$m_2$ and $\tau_0$-$\tau_3$ space
for the low-mass CBC search. For a typical data block, 
the bank contains around 6000 templates (Virgo, which 
has a flatter noise PSD, requires more).

As the definitions of the noise-weighted inner product and of
the metric imply, the metric depends on both
the detector-noise PSD and the frequency limits $f_{\rm low}$ and
$f_{\rm high}$. We set $f_{\rm low}$ to 40 Hz, while $f_{\rm high}$ is chosen naturally
as the frequency at which waveforms end (200 Hz
and 2 kHz for the highest- and lowest-mass signals, respectively).
The PSD changes between data blocks, but
usually only slightly, so template banks stay roughly con- 
stant over time in a data set. 



C. Matched filtering 

The central stage of the pipeline is the matched fil- 
tering of detector data with bank templates, resulting in 
a list of triggers that are further analyzed downstream. 
This stage was described in detail in Ref. [32]; here we
sketch its key features. 

The waveform from a non-spinning CBC, as observed
by a ground-based detector and neglecting higher-order
amplitude corrections, can be written as

$$ h(\tau) = h_0(\tau) \cos\Phi_0 + h_{\pi/2}(\tau) \sin\Phi_0. \tag{7} $$

Here, $\tau$ is a time variable relative to the coalescence time,
$t_c$. The constant amplitude $A$ and phase $\Phi_0$, between
them, depend on all the binary parameters: masses, sky
location and distance, orientation, and (nominal) orbital
phase at coalescence. By contrast, the time-dependent
frequency $f(\tau)$ and phase $\Phi(\tau)$ depend only on the component
masses (footnote 1) and on the absolute time of coalescence.

FIG. 2. A typical template bank for a low-mass CBC inspiral
search, as plotted in $m_1$-$m_2$ space (top panel) and $\tau_0$-$\tau_3$ space
(bottom panel). Templates are distributed more evenly over
$\tau_0$ and $\tau_3$, since the parameter-space metric is approximately
flat in those coordinates.

The squared SNR $\rho^2$ for the data $s$ and template $h$,
analytically maximized over $A$ and $\Phi_0$, is given by

$$ \rho^2 = \frac{(s|h_0)^2 + (s|h_{\pi/2})^2}{(h_0|h_0)}; \tag{9} $$

here we assume that $\tilde{h}_{\pi/2}(f) = i\,\tilde{h}_0(f)$, which is identically
true for waveforms defined in the frequency domain
with the stationary-phase approximation [41], and approximately
true for all slowly evolving CBC waveforms.

1 Strictly, the waveforms depend upon the red-shifted component
masses $(1 + z)\,m_{1,2}$. Note, however, that this does not affect the
search as one can simply replace the masses by their redshifted
values.

The maximized statistic $\rho^2$ of Eq. (9) is a function only
of the component masses and the time of coalescence $t_c$.
Now, a time shift can be folded in the computation of
inner products by noting that $g(\tau) = h(\tau - \Delta t_c)$ transforms
to $\tilde{g}(f) = e^{i 2\pi f \Delta t_c}\, \tilde{h}(f)$; therefore, the SNR can
be computed as a function of $t_c$ by the inverse Fourier
transform (a complex quantity)

$$ (s|h)(\Delta t_c) = 4 \int_{f_{\rm low}}^{f_{\rm high}} \frac{\tilde{s}(f)\, \tilde{h}^*(f)}{S_n(f)}\, e^{-i 2\pi f \Delta t_c}\, df. \tag{10} $$

Furthermore, if $\tilde{h}_{\pi/2}(f) = i\,\tilde{h}_0(f)$ then Eq. (10), computed
for $h = h_0$, yields $(s|h_0)(\Delta t_c) + i\,(s|h_{\pi/2})(\Delta t_c)$.

The ihope matched-filtering engine implements the
discrete analogs of Eqs. (9) and (10) [25] using the efficient
FFTW library [42]. The resulting SNRs are not
stored for every template and every possible $t_c$; instead,
we only retain triggers that exceed an empirically determined
threshold (typically 5.5) and that correspond to
maxima of the SNR time series: a trigger above
the threshold is kept only if there are no triggers with
higher SNR within a predefined time window, typically
set to the length of the template (this is referred to as
time clustering).
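The FFT-based evaluation of the SNR time series and the subsequent time clustering can be sketched as follows. This is an illustrative NumPy version, not the ihope engine (which uses FFTW); the function names, the discretization, and the clustering loop are assumptions made for this example. NumPy's forward transform uses the kernel $e^{-2\pi i f t}$, so the time shift enters the inverse transform with the sign appropriate to that convention.

```python
import numpy as np


def snr_time_series(data, template, psd, sample_rate, f_low=40.0):
    """SNR versus coalescence time for one template.

    `psd` is the one-sided PSD sampled at the rfft frequencies of `data`.
    """
    n = len(data)
    dt = 1.0 / sample_rate
    df = sample_rate / n
    freqs = np.fft.rfftfreq(n, d=dt)
    s_f = np.fft.rfft(data) * dt            # approximate continuous transforms
    h_f = np.fft.rfft(template, n) * dt
    weight = np.zeros_like(psd, dtype=float)
    band = freqs >= f_low
    weight[band] = 1.0 / psd[band]
    # Positive frequencies only, so the inverse transform is complex: its real
    # and imaginary parts are the filter outputs of the two template phases.
    kernel = np.zeros(n, dtype=complex)
    kernel[:freqs.size] = s_f * np.conj(h_f) * weight
    z = 4.0 * df * n * np.fft.ifft(kernel)  # ifft carries a factor 1/n
    sigma_sq = 4.0 * df * np.sum(np.abs(h_f) ** 2 * weight)
    return np.abs(z) / np.sqrt(sigma_sq)


def cluster_in_time(snr, threshold, window):
    """Keep above-threshold samples that are the loudest within +-window."""
    triggers = []
    for i in np.flatnonzero(snr > threshold):
        lo, hi = max(0, i - window), min(len(snr), i + window + 1)
        if snr[i] == snr[lo:hi].max():
            triggers.append((i, snr[i]))
    return triggers
```

One inverse FFT yields the SNR at every candidate coalescence time at once, which is what makes the search over $t_c$ affordable; the clustering step then reduces each loud event to a single trigger per template.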

For a single template and time and for detector data 
consisting of Gaussian noise, $\rho^2$ follows a $\chi^2$ distribution
with two degrees of freedom, which makes a threshold of
5.5 seem rather large: $p(\rho > 5.5) = 2.7 \times 10^{-7}$. However,
we must account for the fact that we consider a full tem- 
plate bank and maximize over time of coalescence: the 
bank makes for, conservatively, a thousand independent 
trials at any point in time, while trials separated by 0.1 
seconds in time are essentially independent. Therefore, 
we expect to see a few triggers above this threshold al- 
ready in a few hundred seconds of Gaussian noise, and a 
large number in a year of observing time. Furthermore, 
since the data contain many non-Gaussian noise transients,
the trigger rate will be even higher. In Fig. 3 we
show the distribution of triggers as a function of SNR in 
a month of simulated Gaussian noise (blue) and real data 
(red) from LIGO's fifth science run (S5). The difference 
between the two is clearly noticeable, with a tail of high 
SNR triggers extending to SNRs well over 1000 in real 
data. 
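The trials-factor argument above can be made explicit. For a $\chi^2$ variable with two degrees of freedom, the probability that $\rho$ exceeds a threshold is $\exp(-\rho^2/2)$. The sketch below combines this with the rough trial counts quoted in the text (about a thousand independent templates and independent times every 0.1 s); the function names are illustrative.

```python
import math


def single_trial_fap(rho_threshold):
    """P(rho > threshold) when rho^2 is chi-squared with 2 degrees of freedom."""
    return math.exp(-rho_threshold ** 2 / 2.0)


def expected_noise_triggers(rho_threshold, duration_s,
                            n_templates=1000, trials_per_second=10.0):
    """Expected number of Gaussian-noise threshold crossings in `duration_s`."""
    n_trials = n_templates * trials_per_second * duration_s
    return n_trials * single_trial_fap(rho_threshold)


fap = single_trial_fap(5.5)
print(f"{fap:.2e}")   # 2.70e-07
print(expected_noise_triggers(5.5, duration_s=1000.0))
```

Because the count grows linearly with observing time and with the number of independent templates, a threshold that looks stringent for a single trial still admits a steady stream of noise triggers over a month of data.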

It is useful to not just cluster in time, but also across 
the template bank. When the SNR for a template is 
above threshold, it is probable that it will be above 
threshold also for many neighboring templates, which en- 
code very similar waveforms. The ihope pipeline selects 
only one (or a few) triggers for each event (be it a GW or 
a noise transient), using one of two algorithms. In time- 
window clustering, the time series of triggers from all 
templates is split into windows of fixed duration; within 
each window, only the trigger with the largest SNR is 



6 




10.0 



100.0 



1000.0 




0.1 0.2 0.3 0. 

average metric distance 



0.5 



FIG. 3. Distribution of single detector trigger SNRs in a 
month of simulated Gaussian noise (blue) and real S5 LIGO 
data (red) from the Hanford interferometer HI. 



FIG. 4. Distribution of average parameter-space distance 
between coincident triggers associated with simulated GW 
signals in a month of representative S5 data, as recovered by 
the LIGO HI and LI detectors. 



kept. This method has the advantage of simplicity, and 
it guarantees an upper limit on the trigger rate. However, 
a glitch that creates triggers in one region of parameter 
space can mask a true signal that creates triggers else- 
where. This problem is remedied in TrigScan cluster- 
ing 03], whereby triggers are grouped by both time and 
recovered (template) masses, using the parameter-space 
metric to define their proximity (for a detailed descrip- 
tion see 05). However, when the data are particularly 
glitchy TrigScan can output a number of triggers that 
can overwhelm subsequent data processing such as coin- 
cident trigger finding. 
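The simpler of the two algorithms, time-window clustering, can be sketched in a few lines. The trigger representation (a tuple of time, SNR, and template index) and the function name are hypothetical; TrigScan itself is not reproduced here.

```python
def time_window_cluster(triggers, window):
    """Keep only the loudest trigger in each fixed-duration time window.

    `triggers` is an iterable of (time, snr, template_id) from all templates.
    """
    loudest = {}
    for time, snr, template_id in triggers:
        key = int(time // window)            # index of the window containing it
        if key not in loudest or snr > loudest[key][1]:
            loudest[key] = (time, snr, template_id)
    return [loudest[key] for key in sorted(loudest)]


example = [(0.2, 6.0, 11), (0.7, 9.5, 12), (1.1, 5.8, 40), (3.9, 7.2, 3)]
clustered = time_window_cluster(example, window=1.0)
print(clustered)
```

The output can never contain more than one trigger per window, which is the rate guarantee mentioned in the text; it also shows the drawback, since the quieter trigger at 0.2 s is discarded regardless of which template produced it.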



D. Multi-detector coincidence 

The next stage of the pipeline compares the triggers 
generated for each of the detectors, and retains only those 
that are seen in coincidence. Loosely speaking, triggers 
are considered coincident if they occurred at roughly the 
same time, with similar masses; see Ref. [45] for an exact
definition of coincidence as used in recent CBC searches.
To wit, the "distance" between triggers is measured with
the parameter-space metric of Eq. (4), maximized over
the signal phase $\Phi_0$. Since different detectors at different
times have different noise PSDs and therefore metrics,
we construct a constant-metric-radius ellipsoid in $\tau_0$-$\tau_3$-$t_c$
space, using the appropriate metric for every trigger
in every detector, and we deem pairs of triggers to be 
coincident if their ellipsoids intersect. The radius of the 
ellipsoids is a tunable parameter. Computationally, the 
operation of finding all coincidences is vastly sped up by 
noticing that only triggers that are close in time could 
possibly have intersecting ellipsoids; therefore the trig- 
gers are first sorted by time, and only those that share a 
small time window are compared. 
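The time-sorting shortcut can be sketched as follows. The ellipsoid-overlap test is represented by a caller-supplied function, `is_coincident`, which is a placeholder for the metric-based test and not an ihope routine; the dictionary-based trigger format is likewise an assumption for this example.

```python
from bisect import bisect_left, bisect_right


def find_coincidences(trigs_a, trigs_b, time_window, is_coincident):
    """Pair triggers from two detectors, comparing only those close in time."""
    trigs_b = sorted(trigs_b, key=lambda trig: trig["time"])
    times_b = [trig["time"] for trig in trigs_b]
    pairs = []
    for trig_a in sorted(trigs_a, key=lambda trig: trig["time"]):
        # Binary search restricts the expensive test to nearby triggers.
        lo = bisect_left(times_b, trig_a["time"] - time_window)
        hi = bisect_right(times_b, trig_a["time"] + time_window)
        for trig_b in trigs_b[lo:hi]:
            if is_coincident(trig_a, trig_b):
                pairs.append((trig_a, trig_b))
    return pairs
```

Sorting once and then searching with bisection replaces the all-pairs comparison by one whose cost grows with the number of triggers that actually fall within the time window of each other.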



When the detectors are not co-located, the coincidence 
test must also take into account the light travel time be- 
tween detectors. This is done by computing the metric 
distance while iteratively adding a small value, $\delta t_c$, to the
end time of one of the detectors. $\delta t_c$ varies over the possible
range of time delays due to light travel time between
the two detectors. The lowest value of the metric distance 
is then used to determine if the triggers are coincident or 
not. 

In Fig. 4 we show the distribution of metric distances
(the minimum value for which the ellipsoids centred on
the triggers overlap) for coincident triggers associated
with simulated GW signals (see Sec. IV C). The number
of coincidences falls off rapidly with increasing metric 
distances, whereas it would remain approximately con- 
stant for background coincident triggers generated by 
noise. However, it is the quieter triggers from farther 
GW sources (which are statistically more likely) that are 
recovered with the largest metric distances. Therefore 
larger coincidence ellipsoids can improve the overall sen- 
sitivity of a search. 

The result of the coincidence process is a list of all trig- 
gers that have SNR above threshold in two or more detec- 
tors and consistent parameters (masses and coalescence 
times) across detectors. When more than two detec- 
tors are operational, different combinations and higher- 
multiplicity coincidences are possible (e.g., three detec- 
tors yield triple coincidences and three types of double 
coincidences). 

In Fig. 5 we show the distribution of coincident H1
triggers as a function of SNR in a month of simulated
Gaussian noise (blue) and real S5 LIGO data (red).
The largest single-detector SNRs for Gaussian noise are
~ 7-8, comparable with (although somewhat larger than)
early theoretical expectations [46, 47]. However, the distribution
in real data is significantly worse, with SNRs
of hundreds and even thousands. If we were to end our
analysis here, a GW search in real data would be a hundred
times less sensitive (in distance) than a search in
Gaussian, stationary noise with the same PSD.

FIG. 5. Distribution of single detector SNRs for H1 coincident
triggers in a month of simulated Gaussian noise (blue) and
representative S5 data (red). Coincidence was evaluated after
time-shifting the SNR time series, so that only background
coincidences caused by noise would be included. Comparison
with Fig. 3 shows that the coincidence requirement reduces
the high-SNR tail, but by no means eliminates it.



III. IHOPE, PART 2: MITIGATING THE 
EFFECTS OF NON-GAUSSIAN NOISE WITH 
SIGNAL-CONSISTENCY TESTS, VETOES, AND 
RANKING STATISTICS 

To further reduce the tail of high-SNR triggers caused
by the non-Gaussianity and nonstationarity of noise, the
ihope pipeline includes a number of signal-consistency
tests, which compare the properties of the data around
the time of a trigger with those expected for a real GW
signal. After removing duplicates, the coincident triggers
in each 2048 s block are used to create a triggered
template bank. Any template in a given detector that
forms at least one coincident trigger in a given 2048 s block
will enter the triggered template bank for that detector
and block. The new bank is again used to filter
the data as described in Sec. II C, but this time signal-consistency
tests are also performed. These include the
$\chi^2$ (Sec. III A) and $r^2$ (Sec. III B) tests. Coincident triggers
are selected as described in Sec. II D, and they are
also tested for the consistency of relative signal amplitudes
(Sec. III C); at this stage, data-quality vetoes are
applied (Sec. III D) to sort triggers into categories according
to the quality of data at their times.

The computational cost of the entire pipeline is
reduced greatly by applying the expensive signal-consistency
checks only in this second stage; the triggered
template bank is, on average, a factor of ~ 10
smaller than the original template bank in the analysis
described in [12]. However, the drawback is greater complexity
of the analysis, and the fact that the coincident
triggers found at the end of the two stages may not be
identical.



A. The $\chi^2$ signal-consistency test

The basis of the $\chi^2$ test [30] is the consideration that
although a detector glitch may generate triggers with the 
same SNR as a GW signal, the manner in which the SNR 
is accumulated over time and frequency is likely to be 
different. For example, a glitch that resembles a delta 
function corresponds to a burst of signal power concen- 
trated in a small time-domain window, but smeared out 
across all frequencies. A CBC waveform, on the other 
hand, will accumulate SNR across the duration of the 
template, consistently with the chirp-like morphology of
the waveform. 

To test whether this is the case, the template is broken 
into p orthogonal subtemplates with support in adjacent 
frequency intervals, in such a way that each subtemplate 
would generate the same SNR on average over Gaus- 
sian noise realizations. The actual SNR achieved by each 
subtemplate filtered against the data is compared to its 
expected value, and the squared residuals are summed. 
Thus, the $\chi^2$ test requires $p$ inverse Fourier transforms
per template. For the low-mass CBC search, we found
that setting $p = 16$ provides a powerful discriminator
without incurring an excessive computational cost [48].
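The statistic described above can be written compactly once the $p$ subtemplate outputs are available. The sketch below assumes that each band's complex filter output has already been normalized so that the $p$ outputs sum to the full complex SNR; the function name and this input convention are assumptions for illustration.

```python
import numpy as np


def chisq_from_bands(band_snrs):
    """Chi-squared statistic from the complex SNR contributions of p bands.

    Each band is expected to contribute 1/p of the total; the statistic is
    p times the summed squared residuals from that expectation.
    """
    z = np.asarray(band_snrs, dtype=complex)
    p = z.size
    expected = z.sum() / p
    return float(p * np.sum(np.abs(z - expected) ** 2))


# A signal that matches the template spreads its SNR evenly across bands...
print(chisq_from_bands(np.full(16, 0.5 + 0.0j)))
# ...whereas power concentrated in one band gives a large value.
print(chisq_from_bands([8.0] + [0.0] * 15))
```

In Gaussian noise, with each band output having variance $1/p$ per quadrature, this quantity averages $2p - 2$, in line with the distribution quoted in the following paragraph of the text.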

For a GW signal that matches the template waveform
exactly, the sum of squared residuals follows the $\chi^2$ distribution
with $2p - 2$ degrees of freedom. For a glitch, or
a signal that does not match the template, the expected
value of the $\chi^2$ test is increased by an amount proportional
to the total SNR$^2$, with a proportionality constant that
depends on the mismatch between the signal and the
template. For signals, we may write the expected $\chi^2$
value as

$$ \langle \chi^2 \rangle = (2p - 2) + \epsilon^2 \rho^2, \tag{11} $$

where $\epsilon$ is a measure of signal-template mismatch. Even
if CBC signals do not match template waveforms perfectly,
due to template-bank discreteness, theoretical
waveform inaccuracies [49], spin effects [23], calibration
uncertainties [50], and so on, they will still yield significantly
smaller $\chi^2$ than most glitches. It was found empirically
that a good fraction of glitches are removed (with
minimal effect on simulated signals) by imposing a SNR-dependent
$\chi^2$ threshold of the form

$$ \chi^2 < \xi^2 (p + \delta \rho^2), \tag{12} $$

with $\xi^2 = 10$ and $\delta = 0.2$.
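The SNR-dependent cut is simple to apply per trigger. The sketch below uses the parameter values quoted in the text ($p = 16$, $\xi^2 = 10$, $\delta = 0.2$); the function name is illustrative.

```python
def passes_chisq_cut(chisq, rho, p=16, xi_sq=10.0, delta=0.2):
    """True if a trigger survives the SNR-dependent chi-squared threshold."""
    return chisq < xi_sq * (p + delta * rho ** 2)


# At rho = 8 the threshold is 10 * (16 + 0.2 * 64) = 288.
print(passes_chisq_cut(100.0, 8.0))   # True
print(passes_chisq_cut(300.0, 8.0))   # False
```

The threshold grows with $\rho^2$, so loud signals with a modest template mismatch are not rejected, while loud glitches must still satisfy a bound far below their typical $\chi^2$ values.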

In Fig. 6 we show the distribution of $\chi^2$ as a function
of SNR. A large number of triggers would have appeared
in the upper left corner of the plot (large $\chi^2$ value relative
to the measured SNR), but these have been removed
by the cut. Even following the cut, a clear separation
between noise background and simulated signals can easily
be observed. This will be used later in formulating
a detection statistic that combines the values of both $\rho$
and $\chi^2$.

FIG. 6. The $\chi^2$ test plotted against SNR for triggers in a
month of representative S5 data after the $\chi^2$ test has been applied,
and the $r^2$ cut has been applied for triggers with $\rho < 12$.
The blue crosses mark time-shifted background triggers, the
red pluses mark simulated-GW triggers. The solid, colored
lines on the plots indicate lines of constant effective SNR (top
panel) and new SNR (bottom panel), which are described in
Sec. III E. Larger values of effective/new SNR are at the
bottom and right end of the plots. The clearly visible notch
in the H1 and L1 plots is caused by the discontinuity in the $r^2$
cut at an SNR of 12 (Sec. III B).



FIG. 7. Value of SNR and $\chi^2$ as a function of time, for a
simulated CBC signal with SNR = 300 in a stretch of S5 data
from the H1 detector. The SNR shows a characteristic rise
and fall around the signal. The $\chi^2$ value is small at the time of
the signal, but increases steeply to either side as the template
waveform is offset from the signal in the data.



B. The $r^2$ signal-consistency test

We can also test the consistency of the data with a
postulated signal by examining the time series of SNRs
and $\chi^2$ values. For a true GW signal, this would show a single
sharp peak at the time of the signal, with the width of the
falloff determined by the autocorrelation function of the
template [FTJ [S5]. Thus, counting the number of time
samples around a trigger for which the SNR is above
a set threshold provides a useful consistency test [53].
Examining the behavior of the $\chi^2$ time series provides a
more powerful diagnostic [53]. To wit, the $r^2$ test sets an
upper threshold on the amount of time $\Delta T$ (in a window
$T$ prior to the trigger; see footnote 2) for which

$$\chi^2 > p\, r^2, \tag{13}$$

where $p$ is the number of subtemplates used to compute
the $\chi^2$. We found empirically that setting $T = 6$ s and
$r^2 = 15$ produces a powerful test [53]. Figure 7 shows
the characteristic shape of the $\chi^2$ time series for CBC
signals: close to zero when the template is aligned with
the signal, then increasing as the two are offset in time,
before falling off again with larger time offsets.

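An effective $\Delta T$ threshold must be a function of SNR;
the $\Delta T$ commonly used for ihope searches is

$$\Delta T < \begin{cases} 2 \times 10^{-4}\ \mathrm{s} & \text{for } \rho < 12, \\ \rho^{9/8} \times 7.5 \times 10^{-3}\ \mathrm{s} & \text{for } \rho \geq 12. \end{cases} \tag{14}$$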



2 The nonsymmetric window was chosen because the merger-ringdown
phase of CBC signals, which is not modeled in inspiral-only
searches, may cause an elevation in the $\chi^2$ time series after
the trigger.

FIG. 8. The time for which $\chi^2$ is above $p\, r^2$ as a function of SNR, for
all second-stage H1 triggers in a month of representative S5
data. The $r^2$ test has already been applied on triggers with
$\rho < 12$, and only those surviving the cut are shown. The blue
crosses mark all background triggers (with $\rho \geq 12$) that fail
the cut; blue circles indicate background triggers that pass it.
Red circles mark simulated-GW triggers, none of which are
cut.



The threshold for $\rho < 12$ eliminates triggers for which
any sample is above the threshold from Eq. (13).

In Fig. 8 we show the effect of such an SNR-dependent test. For
$\rho < 12$, the value of $\Delta T$ is smaller than the sampling interval,
therefore triggers are discarded if there are any time samples
in the 6 s prior to the trigger for which Eq. (13) is
satisfied. (Since the 6 s window includes the trigger, for
some SNRs this imposes a more stringent requirement
than the $\chi^2$ test (12), explaining the notch at $\rho < 12$
and relatively large $\chi^2$ values in Fig. 6.) For $\rho \geq 12$,
the threshold is SNR dependent. The $r^2$ test is powerful
at removing a large number of high-SNR background
triggers (the blue crosses), without affecting the triggers
produced by simulated GW signals (the red circles). The
cut is chosen to be conservative to allow for any imperfect
matching between CBC signals and template waveforms.
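
The logic of the $r^2$ test can be sketched as follows. This is an illustrative sketch under stated assumptions, not the pipeline code: the function names are hypothetical, the $\chi^2$ time series is assumed to end at the trigger time, the 4096 Hz sample rate in the usage note is an assumed value, and Eq. (14) is read with the second branch applying for $\rho \geq 12$.

```python
import numpy as np


def delta_t_threshold(snr):
    """Maximum allowed time (s) above threshold, assuming Eq. (14)."""
    if snr < 12:
        return 2e-4
    return snr ** (9 / 8) * 7.5e-3


def time_above_threshold(chi2_series, sample_rate, p=16, r2=15.0, window=6.0):
    """Time (s) in the last `window` seconds for which chi^2 > p * r2.

    `chi2_series` is assumed to end at the trigger time.
    """
    n = int(round(window * sample_rate))
    recent = np.asarray(chi2_series)[-n:]
    return np.count_nonzero(recent > p * r2) / sample_rate


def passes_r2_test(chi2_series, snr, sample_rate, **kwargs):
    """True if the time above threshold is below the SNR-dependent limit."""
    dt = time_above_threshold(chi2_series, sample_rate, **kwargs)
    return dt < delta_t_threshold(snr)
```

With an assumed sample rate of 4096 Hz, a single sample above $p\, r^2$ already corresponds to about $2.4 \times 10^{-4}$ s, which exceeds the low-SNR limit of $2 \times 10^{-4}$ s; this is why one offending sample is enough to reject a trigger with $\rho < 12$.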



C. Amplitude-consistency tests 

The two LIGO Hanford detectors H1 and H2 share
the same vacuum tubes, and therefore expose the same
sensitive axes to any incoming GW. Thus, the ratio of the
H1 and H2 SNRs for true GW signals should equal the
ratio of detector sensitivities. We can formulate a
test of H1-H2 amplitude consistency (see footnote 3) in terms of a GW
source's effective distance $D_{\mathrm{eff},A}$, the distance at which
an optimally located and oriented source would give the
SNR observed with detector A. Namely, we require that

$$\kappa = 2\, \frac{\left| D_{\mathrm{eff,H1}} - D_{\mathrm{eff,H2}} \right|}{D_{\mathrm{eff,H1}} + D_{\mathrm{eff,H2}}} < \kappa^*; \tag{15}$$

setting a threshold $\kappa^*$ provides discrimination against
noise triggers while allowing for some measurement uncertainty.
In Fig. 9 we show the distribution of $\kappa$ for
simulated-GW triggers and background triggers in a
month of representative S5 data. We found empirically
that setting $\kappa^* = 0.6$ produces a powerful test.

3 The detector H2 was not operational during LIGO run S6, so the
H1-H2 amplitude-consistency tests were not applied; they were
however used in searches over data from previous runs.

FIG. 9. Distribution of $\kappa$ [Eq. (15)], the fractional difference
in the effective distances measured by H1 and H2 for coincident
triggers in those detectors in a month of representative
S5 data. Background triggers (blue) tend to have larger $\kappa$
than simulated-GW triggers (red).

An amplitude-consistency test can also be defined for
triggers that are seen in only one of H1 and H2. We do
this by removing any triggers from H1 which are loud
enough that we would have expected to observe a trigger
in H2 (and vice versa). We proceed by calculating $\sigma_A$,
the distance at which an optimally located and oriented
source yields an SNR of 1 in detector A, and noting that
$D_{\mathrm{eff},A} = \sigma_A / \rho_A$. Then, by rearranging (15), we are led
to require that a trigger that is seen only in H1 satisfy

$$\rho_{\mathrm{H1}} < \frac{\sigma_{\mathrm{H1}}}{\sigma_{\mathrm{H2}}}\, \frac{2 + \kappa^*}{2 - \kappa^*}\, \rho^*_{\mathrm{H2}}, \tag{16}$$

where $\rho^*_{\mathrm{H2}}$ is the SNR threshold used for H2. The effective
distance cut removes essentially all H2 triggers
for which there is no H1 coincidence: since H2 typically
had around half the distance sensitivity of H1, a value of
$\kappa^* = 0.6$ imposes $\rho_{\mathrm{H2}} < \rho^*_{\mathrm{H1}}$.
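
Both amplitude-consistency checks reduce to a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the single-detector bound assumes that Eq. (16) has the form $\rho_{\rm seen} < (\sigma_{\rm seen} / \sigma_{\rm other})\,[(2 + \kappa^*)/(2 - \kappa^*)]\,\rho^*_{\rm other}$, which follows from inserting $D_{\mathrm{eff},A} = \sigma_A / \rho_A$ into Eq. (15).

```python
def kappa(d_eff_h1, d_eff_h2):
    """Fractional difference in effective distance, as in Eq. (15)."""
    return 2.0 * abs(d_eff_h1 - d_eff_h2) / (d_eff_h1 + d_eff_h2)


def passes_kappa_test(d_eff_h1, d_eff_h2, kappa_star=0.6):
    """True if an H1-H2 coincidence is amplitude consistent."""
    return kappa(d_eff_h1, d_eff_h2) < kappa_star


def max_single_detector_snr(sigma_seen, sigma_other, snr_threshold_other,
                            kappa_star=0.6):
    """Largest SNR allowed for a trigger seen in only one detector.

    Assumes the form of Eq. (16) stated in the text above this block.
    """
    ratio = (2.0 + kappa_star) / (2.0 - kappa_star)
    return (sigma_seen / sigma_other) * ratio * snr_threshold_other
```

For $\kappa^* = 0.6$ the factor $(2 + \kappa^*)/(2 - \kappa^*)$ is about 1.86. If the unseen detector is twice as sensitive as the one that triggered, the bound becomes roughly 0.93 times the other detector's threshold, so triggers above a similar threshold in the less sensitive detector are rejected.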

Neither test was used between any other pair of detectors
because, in principle, any ratio of effective distances
is possible for a real signal seen in two nonaligned detectors.
However, large values of $\kappa$ are rather unlikely,
especially for the Hanford and Livingston LIGO detectors,
which are almost aligned. Therefore amplitude-consistency
tests should still be applicable.



D. Data-quality vetoes 

Environmental factors can cause periods of elevated 
detector glitch rate. In the very worst (but very rare) 
cases, this makes the data essentially unusable. More 
commonly, if these glitchy periods were analyzed together 
with periods of relatively clean data, they could produce 
a large number of high-SNR triggers, and possibly mask 
GW candidates in clean data. It is therefore necessary 
to remove or separate the glitchy periods. 

This is accomplished using data quality (DQ) flags [55-57].
All detectors are equipped with environmental and
instrumental monitors; their output is recorded in the
detector's auxiliary channels. Periods of heightened activity
in these channels (e.g., as caused by elevated seismic
noise [53]) are automatically marked with DQ flags
[59]. DQ flags can also be added manually if the detector
operators observe poor instrumental behavior.

If a DQ flag is found to be strongly correlated with 
CBC triggers, and if the flag is safe (i.e., not triggered 
by real GWs), then it can be used as a DQ veto. Veto
safety is assessed by comparing the fraction of hardware 
GW injections that are vetoed with the total fraction of 
data that is vetoed. During the S6 and VSR2-3 runs, 
a simplified form of ihope was run daily on the pre- 
ceding 24 hours of data from each detector individually, 
specifically looking for non-Gaussian features that could 
be correlated with instrumental or environmental effects 
[551 fHO] . The results of these daily runs were used to help 
identify common glitch mechanisms and to mitigate the 
effects of non-Gaussian noise by suggesting data quality 
vetoes. 

Vetoes are assigned to categories based on the severity 
of instrumental problems and on how well the couplings 
between the GW and auxiliary channels are understood 
[55-57]. Correspondingly, CBC searches assign data to
four DQ categories: 

Category 1: Seriously compromised or missing data. 
The data are entirely unusable, to the extent that 
they would corrupt noise PSD estimates. These 
times are excluded from the analysis, as if the detector
was not in science mode (introduced in Sec. II A).



Category 2: Instrumental problems with known cou- 
plings to the GW channel. Although the data are 
compromised, these times can still be used for PSD 
estimation. Data flagged as category-2 are ana- 
lyzed in the pipeline, but any triggers occurring 
during these times are discarded. This reduces the 
fragmentation of science segments, maximizing the 
amount of data that can be analyzed. 



Category 3: Likely instrumental problems, casting 
doubt on triggers found during these times. Data 
flagged as category-3 are analyzed and triggers 
are processed. However, the excess noise in such 
times may obscure signals in clean data. Conse- 
quently, the analysis is also performed excluding 
time flagged as category-3, allowing weaker signals 
in clean data to be extracted. These data are ex- 
cluded from the estimation of upper limits on GW- 
event rates. 

Good data: Data without any active environmental or 
instrumental source of noise transients. These data 
are analyzed in full. 
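
The way veto categories act on a trigger list can be sketched in a few lines. This is a hypothetical illustration, not the ihope data structures: segments are represented as half-open (start, end) pairs in seconds, category-1 times are assumed to have been excluded before any triggers were generated, and all names are invented for the example.

```python
def in_segments(t, segments):
    """True if time t falls inside any half-open (start, end) segment."""
    return any(start <= t < end for start, end in segments)


def apply_dq_vetoes(trigger_times, cat2_segments, cat3_segments):
    """Return two trigger lists: after category-2 vetoes only, and
    after both category-2 and category-3 vetoes."""
    after_cat2 = [t for t in trigger_times
                  if not in_segments(t, cat2_segments)]
    after_cat3 = [t for t in after_cat2
                  if not in_segments(t, cat3_segments)]
    return after_cat2, after_cat3
```

Keeping both lists mirrors the description above: the first is the one searched for loud signals, the second the cleaner set in which weaker signals can be extracted.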

Poor quality data are effectively removed from the
analysis, reducing the total amount of analyzed time. For
instance, in the third month of the S5 analysis reported
in Ref. [12], removing category-1 times left $1.2 \times 10^6$ s of
data when at least two detectors were operational; removing
category-2 and -3 times left $1.0 \times 10^6$ s, although
the majority of lost time was category-3, and was therefore
analyzed for loud signals.



E. Ranking statistics 

The application of signal-consistency and amplitude- 
consistency tests, as well as data-quality vetoes, is very 
effective in reducing the non-Gaussian tail of high-SNR 
triggers. In Fig. 10 we show the distribution of H1 triggers
that are coincident with triggers in the L1 detector
(in time shifts) and that pass all cuts. For consistency,
identical cuts have been applied to the simulated, Gaus- 
sian data, including vetoing times of poor data quality 
in the real data. The majority of these have minimal 
impact, although the data quality vetoes will remove a 
(random) fraction of the triggers arising in the simulated 
data analysis. 

Remarkably, in the real data, almost no triggers are 
left that have SNR > 10. Nevertheless, a small num- 
ber of coincident noise triggers with large SNR remain. 
These triggers have passed all cuts, but they generally
have significantly worse $\chi^2$ values than expected for true
signals, as we showed in Fig. 6.

It is therefore useful to rank triggers using a combination
of SNR and $\chi^2$, by introducing a re-weighted SNR.
Over the course of the LIGO-Virgo analyses, several distinct
re-weighted SNRs have been used. For the LIGO
S5 run and Virgo's first science run (VSR1), we adopted
the effective SNR $\rho_{\mathrm{eff}}$, defined as [TT]

$$\rho_{\mathrm{eff}} = \frac{\rho}{\left[ \left( \chi^2 / n_{\mathrm{dof}} \right) \left( 1 + \rho^2 / 250 \right) \right]^{1/4}}, \tag{17}$$



where $n_{\mathrm{dof}} = 2p - 2$ is the number of $\chi^2$ degrees of freedom,
and the factor 250 was tuned empirically to provide
separation between background triggers and simulated
GW signals. The normalization of $\rho_{\mathrm{eff}}$ ensures that
a "quiet" signal with $\rho \simeq 8$ and $\chi^2 \simeq n_{\mathrm{dof}}$ will have
$\rho_{\mathrm{eff}} \simeq \rho$.

FIG. 10. Distribution of single detector SNRs for H1 triggers
found in coincidence with L1 triggers (in time shifts) in a
month of simulated Gaussian noise (blue) and representative
S5 data (red). These triggers have survived $\chi^2$, $r^2$, and H1-H2
amplitude-consistency tests, as well as DQ vetoes.
Figure 6 shows contours of constant $\rho_{\mathrm{eff}}$ in the $\rho$-$\chi^2$
plane. While $\rho_{\mathrm{eff}}$ successfully separates background triggers
from simulated-GW triggers, it can artificially elevate
the SNR of triggers with unusually small $\chi^2$. As
discussed in Ref. [51], these can sometimes become the
most significant triggers in a search. Thus, a different
statistic was adopted for the LIGO S6 run and Virgo's
second and third science runs (VSR2-3). This new SNR
$\rho_{\mathrm{new}}$ was defined as

$$\rho_{\mathrm{new}} = \begin{cases} \rho & \text{for } \chi^2 \leq n_{\mathrm{dof}}, \\ \rho \left[ \dfrac{1}{2} \left( 1 + \left( \dfrac{\chi^2}{n_{\mathrm{dof}}} \right)^{3} \right) \right]^{-1/6} & \text{for } \chi^2 > n_{\mathrm{dof}}. \end{cases} \tag{18}$$

Figure 6 also shows contours of constant $\rho_{\mathrm{new}}$ in the $\rho$-$\chi^2$
plane. The new SNR was found to provide even better
background-signal separation, especially for low-mass
nonspinning inspirals [14], and it has the desirable feature
that $\rho_{\mathrm{new}}$ does not take larger values than $\rho$ when
the $\chi^2$ is less than the expected value. Other detection
statistics that are functions of $\rho$ and $\chi^2$
can be defined and optimized for analyses covering different
regions of parameter space and different data sets.
For coincident triggers, the re-weighted SNRs mea- 
sured in the coincident detectors are added in quadra- 
ture to give a combined, re-weighted SNR, which is used 
to rank the triggers and evaluate their statistical signif- 
icance. Using this ranking statistic, we find that the 
distribution of background triggers in real data is re- 
markably close to their distribution in simulated Gaus- 
sian noise. Thus, our consistency tests and DQ vetoes 
have successfully eliminated the vast majority of high 
SNR triggers due to non-Gaussian noise from the search. 



FIG. 11. Distribution of single detector new SNR, $\rho_{\mathrm{new}}$,
for H1 triggers found in coincidence with L1 triggers (in time
shifts) in a month of simulated Gaussian noise (blue) and
representative S5 data (red). The tail of high SNR triggers
due to non-Gaussian noise has been virtually eliminated,
a remarkable achievement given that the first stage of the
pipeline generated single-detector triggers with SNR > 1,000.



While this comes at the inevitable cost of missing poten- 
tial detections at times of poor data quality, it signifi- 
cantly improves the detection capability of a search. 
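
The two re-weighted SNRs and their combination across detectors can be written compactly. The sketch below is illustrative and uses hypothetical function names; it assumes that Eq. (17) is $\rho_{\mathrm{eff}} = \rho / [(\chi^2/n_{\mathrm{dof}})(1 + \rho^2/250)]^{1/4}$ and that Eq. (18) leaves $\rho$ unchanged for $\chi^2 \leq n_{\mathrm{dof}}$ and otherwise multiplies it by $[(1 + (\chi^2/n_{\mathrm{dof}})^3)/2]^{-1/6}$, with $n_{\mathrm{dof}} = 2p - 2$.

```python
import math


def effective_snr(snr, chi2, p=16):
    """Effective SNR, assuming the form of Eq. (17)."""
    ndof = 2 * p - 2
    return snr / ((chi2 / ndof) * (1.0 + snr ** 2 / 250.0)) ** 0.25


def new_snr(snr, chi2, p=16):
    """New SNR, assuming the form of Eq. (18)."""
    ndof = 2 * p - 2
    if chi2 <= ndof:
        return snr
    return snr * ((1.0 + (chi2 / ndof) ** 3) / 2.0) ** (-1 / 6)


def combined_snr(reweighted_snrs):
    """Quadrature sum of the re-weighted SNRs from each detector."""
    return math.sqrt(sum(s ** 2 for s in reweighted_snrs))
```

The difference between the two statistics shows up for small $\chi^2$: with $\rho = 8$ and $\chi^2$ well below $n_{\mathrm{dof}}$, `effective_snr` returns a value above 8, whereas `new_snr` never exceeds the input SNR.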



IV. INTERPRETATION OF THE RESULTS 

At the end of the data processing described above, 
the ihope pipeline produces a set of coincident triggers 
ranked by their combined re-weighted SNR; these trig- 
gers have passed the various signal-consistency and data- 
quality tests outlined above. While at this stage the 
majority of loud background triggers identified in real 
data have been eliminated or downweighted, the distri- 
bution of triggers is still different from the case of Gaus- 
sian noise, and it depends on the quality of the detec- 
tor data and the signal parameter space being searched 
over. Therefore it is not possible to derive an analytical
mapping from combined re-weighted SNR to event significance,
as characterized by the FAR. Instead, the FAR is
evaluated empirically by performing numerous time-shift
analyses, in which artificial time shifts are introduced between
the data from different detectors. (These are discussed
in Sec. IV A.) Furthermore, the rate of triggers as
a function of combined re-weighted SNR varies over parameter
space; to improve the FAR accuracy, we divide
triggers into groups with similar combined re-weighted
SNR distributions (see Sec. IV B). The sensitivity of a
search is evaluated by measuring the rate of recovery
of a large number of simulated signals, with parameters
drawn from astrophysically motivated distributions (see
Sec. IV C). The sensitivity is then used to estimate the
CBC event rates or upper limits as a function of signal
parameters (see Sec. IV D).



A. Background event rate from time shifts 

The rate of coincident triggers as a function of combined
re-weighted SNR is estimated by performing numerous
time-shift analyses: in each we artificially introduce
different relative time shifts in the data from each
detector. The time shifts that are introduced must
be large enough that each time-shift analysis is statistically
independent.

To perform the time-shift analysis in practice, we simply
shift the triggers generated at the first matched-filtering
stage of the analysis (Sec. II C), and repeat all subsequent
stages from multi-detector coincidence (Sec. II D) onwards.
Shifts are performed on a ring: for each time-coincidence
period (i.e., data segment where a certain
set of detectors is operational), triggers that are shifted
past the end are re-inserted at the beginning. Since the
time-coincidence periods are determined before applying 
Category-2 and -3 DQ flags, there is some variation in 
analyzed time among time-shift analyses. To ensure sta- 
tistical independence, time shifts are performed in multi- 
ples of 5 s; this ensures that they are significantly larger 
than the light travel time between the detectors, the au- 
tocorrelation time of the templates, and the duration of 
most non-transient glitches seen in the data. Therefore, 
any coincidences seen in the time shifts cannot be due 
to a single GW source, and are most likely due to noise- 
background triggers. It is possible, however, for a GW- 
induced trigger in one detector to arise in time-shift coin- 
cidence with noise in another detector. Indeed, this issue 
arose in Ref. [Hj, where a "blind injection" was added to 
the data to test the analysis procedure. 
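
The ring-shift procedure can be sketched as below. This is a simplified illustration with hypothetical names: triggers are reduced to their end times, and coincidence is modelled as a fixed time window, which is an assumption made for the example rather than the coincidence test used by the pipeline.

```python
def shift_on_ring(trigger_times, seg_start, seg_end, shift):
    """Shift trigger times by `shift` seconds, wrapping triggers that
    pass the end of the segment back to its beginning."""
    duration = seg_end - seg_start
    return sorted(seg_start + (t - seg_start + shift) % duration
                  for t in trigger_times)


def coincidences(times_a, times_b, window):
    """Pairs of triggers from two detectors closer than `window` seconds."""
    return [(a, b) for a in times_a for b in times_b
            if abs(a - b) <= window]
```

In this picture the k-th time-shift analysis would call `shift_on_ring` with `shift = 5 * k` for one detector and then recompute the coincidences, so that any surviving pair cannot come from a single astrophysical source.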

The H1 and H2 detectors share the Hanford beam
tubes and are affected by the same environmental disturbances;
furthermore, noise transients in the two detectors
have been observed to be correlated. Thus, time-shift
analysis is ineffective at estimating the coincident background
between these co-located detectors, and it is not
used. Coincident triggers from H1 and H2 when no other
detectors are operational are excluded from the analysis.
When detectors at additional sites are operational, we
do perform time shifts, keeping H1 and H2 "in time" but
shifting both relative to the other detectors.

Our normal practice is to begin by performing 100
time-shift analyses to provide an estimate of the noise
background. If any coincident in-time triggers are still
more significant (i.e., have larger combined re-weighted
SNR) than all the time-shifted triggers, additional time
shifts are performed to provide an estimate of the FAR.
A very significant candidate would have a very low FAR,
and an accurate determination of its FAR requires a large
number of time slides: in Ref. [14] over a million were performed.
However, there is a limit to the number of statistically
independent time shifts that are possible to perform,
as explored in [53]. Additionally, as the number of
time shifts grows, the computational savings of our two-stage
search are diminished, because a greater fraction of
the templates survive to the second filtering stage where
the computationally costly signal-consistency tests are
performed (see Sec. III A). We are currently investigating
whether it is computationally feasible to run ihope as
a single-stage pipeline and compute $\chi^2$ and $r^2$ for every
trigger.

FIG. 12. Fraction of time-shift coincident triggers between
H1 and L1 in a month of representative S5 data that have
combined new SNR greater than or equal to the x-axis value,
for three chirp-mass bins. The distribution from a month
of Gaussian noise is also shown for comparison. The tails of
the distributions become more shallow for larger chirp masses
$\mathcal{M}$, so triggers with higher $\mathcal{M}$ are more likely to have higher
SNRs.



B. Calculation of false-alarm rates 

The FAR for a coincident trigger is given by the rate 
at which background triggers with the same or greater 
SNR occur due to detector noise. This rate is computed 
from the time-shift analyses; for a fixed combined re- 
weighted SNR, it varies across the template mass space, 
and it depends on which detectors were operational and 
how glitchy they were. To accurately account for this, 
coincident triggers are split into categories, and FARs 
are calculated within each, relative to a background of 
comparable triggers. The triggers from each category are 
then re-combined into a single list and ranked by their 
FARs. 

Typically, signal-consistency tests are more powerful
for longer-duration templates than for shorter ones,
so the non-Gaussian background is suppressed better
for low-mass templates, while high-mass templates are
more likely to result in triggers with larger combined
re-weighted SNRs. In recent searches, triggers have been
separated into three bins in chirp mass $\mathcal{M}$ [32]:
$\mathcal{M} \leq 3.48\, M_\odot$, $3.48\, M_\odot < \mathcal{M} \leq 7.4\, M_\odot$, and $\mathcal{M} > 7.4\, M_\odot$.
Figure 12 shows the distribution of coincident triggers
between H1 and L1 as a function of combined $\rho_{\mathrm{new}}$ for
the triggers in each of these mass bins. As expected, the
high-$\mathcal{M}$ bin has a greater fraction of high-SNR triggers.

The combined re-weighted SNR is calculated as the
quadrature sum of the re-weighted SNRs in the individual detectors.
However, different detectors can have different rates of 
non-stationary transients as well as different sensitivi- 
ties, so the combined SNR is not necessarily the best 
measure of the significance of a trigger. Additionally, 
background triggers found in three-detector coincidence 
will have a different distribution of combined re-weighted 
SNRs than two-detector coincident triggers [TT]. There- 
fore, we separate coincident triggers by their type, which 
is determined by the coincidence itself (e.g., H1H2, or 
H1H2L1) and by the availability of data from each de- 
tector, known as "coincident time." Thus, the trigger 
types would include H1L1 coincidences in H1L1 double- 
coincident time; H1L1, H1V1, L1V1, and H1L1V1 co- 
incidences in H1L1V1 triple-coincident time; and so on. 
When H1 and H2 are both operational, we have fewer
coincidence types than might be expected as H1H2 trig- 
gers are excluded due to our inability to estimate their 
background distribution, and the effective distance cut 
removes H2L1 or H2V1 coincidences. The product of 
mass bins and trigger types yields all the trigger cate- 
gories. 
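
Assigning a trigger to a category then amounts to pairing its mass bin with its coincidence type. The sketch below is a hypothetical illustration: the bin labels and function names are invented for the example, and the placement of the bin edges (upper edge inclusive) is an assumption about the boundaries quoted above.

```python
def mass_bin(chirp_mass):
    """Chirp-mass bin label; bin edges in solar masses."""
    if chirp_mass <= 3.48:
        return "low"
    if chirp_mass <= 7.4:
        return "medium"
    return "high"


def trigger_category(chirp_mass, detectors):
    """Category = (mass bin, coincidence type), e.g. ("low", "H1L1")."""
    return mass_bin(chirp_mass), "".join(sorted(detectors))
```

With three mass bins and, for instance, two coincidence types within a given coincident time, this yields six categories, each with its own background distribution.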

For simplicity, we treat times when different networks 
of detectors were operational as entirely separate exper- 
iments; this is straightforward to do, as there is no over- 
lap in time between them. Furthermore, the data from a 
long science run is typically broken down into a number 
of distinct stretches, often based upon varying detector 
sensitivity or glitchiness, and each is handled indepen- 
dently. 

For each category of coincident triggers within an experiment,
an additional clustering stage is applied. If
there is another coincident trigger with a larger combined
re-weighted SNR within 10 s of a given trigger's end time,
the trigger is removed. We then compute the FAR as a
function of combined re-weighted SNR as the rate (number
over the total coincident, time-shifted search time) of
time-shift coincidences observed with higher combined
re-weighted SNR within each category. These results
must then be combined to estimate the overall significance
of triggers: we calculate a combined FAR across
categories by ranking all triggers by their FAR, counting
the number of more significant time-shift triggers,
and dividing by the total time-shift time. The resulting
combined FAR is essentially the same as the uncombined
FAR, multiplied by the number of categories that were
combined. We often quote the inverse FAR (IFAR) as
the ranking statistic, so that more significant triggers
correspond to larger values. A loud GW may produce
triggers in more than one mass bin, and consequently
more than one candidate trigger might be due to a single
event. This is resolved by reporting only the coincident
trigger with the largest IFAR associated with a given
event. Figure 13 shows the expected mean (the dashed
line) and variation (the shaded areas) of the cumulative
number of triggers as a function of IFAR for the analysis
of three-detector H1H2L1 time in a representative
month of S5 data. The variations among time shifts (the
thin lines) match the expected distribution. The duration
of the time-shift analysis is $\sim 10^8$ s, but taking into
account the six categories of triggers (three mass bins
and two coincidence types), this yields a minimum FAR
of $\sim 1\ \mathrm{yr}^{-1}$.

FIG. 13. Cumulative histogram of triggers vs. IFAR for all
time-shift triggers in H1H2L1 triple-coincident time from a
representative month of S5 data. The black dashed line marks
the expected cumulative number, while the shaded regions
mark its 1- and 2-$\sigma$ variation. The thin grey lines show the
cumulative number for 20 of the time shifts, providing an
additional indication of the expected deviation from the mean.
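
The FAR bookkeeping described above can be made concrete with a short sketch. The function names are hypothetical, the year length of 365.25 days is an assumption, and counting background triggers with the same or greater statistic follows the definition given at the start of this subsection.

```python
SECONDS_PER_YEAR = 365.25 * 86400


def far_per_year(snr, background_snrs, background_time_s):
    """Rate (per year) of time-shift triggers in one category with a
    combined re-weighted SNR at least as large as `snr`."""
    n_louder = sum(1 for b in background_snrs if b >= snr)
    return n_louder / (background_time_s / SECONDS_PER_YEAR)


def combined_far(far, n_categories):
    """Approximate combined FAR: uncombined FAR times category count."""
    return far * n_categories


def min_nonzero_far(background_time_s, n_categories):
    """Smallest nonzero combined FAR measurable from the background:
    one louder trigger in one category, scaled by the category count."""
    return n_categories / (background_time_s / SECONDS_PER_YEAR)
```

For a time-shift duration of $10^8$ s (about 3.2 yr) and six categories, `min_nonzero_far` gives roughly 1.9 per year, consistent with the order-of-magnitude figure of $\sim 1\ \mathrm{yr}^{-1}$ quoted in the text.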

Clearly a FAR of $\sim 1\ \mathrm{yr}^{-1}$ is insufficient to confidently
identify GW events. The challenge of extending background
estimation to the level where a loud trigger can
become a detection candidate was met in the S6-VSR2/3
search [21 [M]. Remarkably, even for FARs of one in tens
of thousands of years, no tail of triggers with large combined
re-weighted SNRs was observed. Evidently, the
cuts, tests, and thresholds discussed in Section III are
effective at eliminating any evidence of a non-Gaussian
background, at least for low chirp masses.

In calculating the FAR, we treat all trigger categories 
identically, so we implicitly assign the same weight to 
each. However, this is not appropriate when the detec- 
tors have significantly different sensitivities, since a GW 
is more likely to be observed in the most sensitive detectors.
In the search of LIGO S5 and Virgo VSR1 data [T3],
this approach was refined by weighting the categories on 
the basis of the search sensitivity for each trigger type. 
However, if there were an accurate astrophysical model of 
CBC merger rates for different binary masses, the weighting
could easily be extended to the mass bins.

C. Evaluating search sensitivity

The sensitivity of a search is measured by adding sim- 
ulated GW signals to the data and verifying their recov- 
ery by the pipeline, which also helps tune the pipeline's 
performance against expected sources. The simulated 
signals can be added as hardware injections pTj [65] , by 
actuating the end mirrors of the interferometers to re- 
produce the response of the interferometer to GWs; or 
as software injections, by modifying the data after it has 
been read into the pipeline. Hardware injections pro- 
vide a better end-to-end test of the analysis, but only 
a limited number can be performed, since the data con- 
taining hardware injections cannot be used to search for 
real GW signals. Consequently, large-scale injection cam- 
paigns are performed in software. 

Software injections are performed into all operational 
detectors coherently (i.e., with relative time delays, 
phases and amplitudes appropriate for the relative lo- 
cation and orientation of the source and the detectors). 
Simulated GW sources are generally placed uniformly 
over the celestial sphere, with uniformly distributed ori- 
entations. The mass and spin parameters are generally 
chosen to uniformly cover the search parameter space, 
since they are not well constrained by astrophysical ob- 
servations, particularly so for binaries containing black 
holes [66]. Although sources are expected to be roughly
uniform in volume, we do not follow that distribution 
for simulations, but instead attempt to place a greater 
fraction of injections at distances where they would be 
marginally detectable by the pipeline. The techniques 
used to reduce the dimensionality of parameter space, 
such as analytically maximizing the detection statistic, 
cannot be applied to the injections, which must cover 
the entire space. This necessitates large simulation cam- 
paigns. 

The ihope pipeline is run on the data containing simu- 
lated signals using the same configuration as for the rest 
of the search. Injected signals are considered to be found 
if there is a coincident trigger within 1 s of their injection 
time. The loudest coincident trigger within the 1 s win- 
dow is associated with the injection, and it may be louder 
than any trigger in the time-shift analyses (i.e., it may 
have a FAR of zero). Using a 1 s time window to asso- 
ciate triggers and injections and no requirement on mass 
consistency may lead to some of these being found spuri- 
ously, in coincidence with background triggers. However, 
this effect has negligible consequences on the estimated 
search sensitivity near the combined re-weighted SNR of
the most significant trigger. 
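
The association rule for injections is simple enough to state as code. This sketch uses hypothetical names and represents each coincident trigger as an (end time, combined re-weighted SNR) pair, which is an assumption made for the example.

```python
def associate_injection(injection_time, triggers, window=1.0):
    """Return the loudest (end_time, combined_snr) trigger within
    `window` seconds of the injection, or None if it was missed."""
    nearby = [t for t in triggers
              if abs(t[0] - injection_time) <= window]
    if not nearby:
        return None
    return max(nearby, key=lambda t: t[1])
```

Because no mass-consistency check is applied, a loud background trigger that happens to fall inside the window would be associated with the injection, which is the source of the spurious recoveries mentioned above.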



Figure 14 shows the results of a large number of software
injections performed in one month of S5 data. For
each injection, we indicate whether the signal was missed
(red crosses) or found (circles, and stars for FAR = 0).
The recovery of simulated signals can be compared with
the theoretically expected sensitivity of the search, taking
into account variations over parameter space: the
expected SNR of a signal is proportional to $\mathcal{M}^{5/6}$ (for
low-mass binaries), inversely proportional to effective distance
(see Sec. III C), and a function of the detectors'
noise PSD. An insightful way to display injections, used
in Fig. 14, is to show their chirp mass $\mathcal{M}$ and decisive
distance, the second largest effective distance for the detectors
that were operating at the time of the injection
(in a coincidence search, it is the second most sensitive
detector that limits the overall sensitivity). Indeed, our
empirical results are in good agreement with the stated
sensitivity of the detectors [67, 68]. A small number of
signals are missed at low distances: these are typically
found to lie close to loud non-Gaussian glitches in the
detector data.

FIG. 14. Found and missed injections in one month of S5
data plotted at their chirp mass $\mathcal{M}$ and decisive distance (see
main text for definition). Red crosses are missed injections;
colored circles are injections found with non-zero combined
FAR, which can be read off the colormap on the right; black
stars are injections found with FAR = 0 (i.e., associated with
triggers louder than any in the background from 100 time
shifts). Nearby injections that are missed or found with high
FARs are followed up to check for problems in the pipeline,
and to improve data quality.



D. Bounding the binary coalescence rate 

The results of a search can be used to estimate (if pos- 
itive detections are reported) or bound the rate of bi- 
nary coalescences. An upper limit on the merger rate is 
calculated by evaluating the sensitivity of the search at 
the loudest observed trigger [3TJ |6"9"H7"T] . Heuristically, 
the 90% rate upper limit corresponds to a few (order 2- 
3) signals occurring over the search time within a small 
enough distance to generate a trigger with IFAR larger 
than the loudest observed trigger. 

More specifically, we assume that CBC events occur 
randomly and independently, and that the event rate is 
proportional to the star-formation rate, which is itself as- 
sumed proportional to blue-light galaxy luminosity |72j . 
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FIG. 15. Search efficiency for BNS injections in a month of 
representative S5 data (blue) and in Gaussian noise (red), for 
a false-alarm rate equal to the FAR of the loudest foreground 
trigger in each analysis. 



For searches sensitive out to tens or hundreds of mega- 
parsecs, it is reasonable to approximate the blue-light 
luminosity as uniform in volume, and quote rates per 
unit volume and time [73] . We follow [TTJ [70] and infer 
the probability density for the merger rate $R$, given that
in an observation time $T$ no other trigger was seen with
IFAR larger than its loudest-event value, $\alpha_m$:

$$p(R \mid \alpha_m, T) \propto p(R)\, e^{-R V(\alpha_m) T} \left[ 1 + \Lambda(\alpha_m)\, R\, T\, V(\alpha_m) \right]; \qquad (19)$$

here $p(R)$ is the prior probability density for $R$, usually
taken as the result of previous searches or as a uniform
distribution for the first search of a kind; $V(\alpha)$ is the
volume of space in which the search could have seen a
signal with IFAR > $\alpha$; and the quantity $\Lambda$ is the relative
probability that the loudest trigger was due to a GW
rather than noise,

$$\Lambda = \frac{|V'(\alpha_m)|}{V(\alpha_m)} \, \frac{P_B(\alpha_m)}{P'_B(\alpha_m)}, \quad \text{with } P_B(\alpha) = e^{-T/\alpha}, \qquad (20)$$



with the prime denoting differentiation with respect to
$\alpha$. For a chosen confidence level $\gamma$ (typically 0.9 = 90%),
the upper limit $R_*$ on the rate is then given by

$$\gamma = \int_0^{R_*} p(R \mid \alpha_m, T)\, dR. \qquad (21)$$
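
As an illustration of Eqs. (19)-(21), the following minimal Python sketch solves for the upper limit under two assumptions that are ours rather than the pipeline's: a uniform prior p(R), and the sensitive volume V and the quantity Λ of Eq. (20) treated as known numbers evaluated at the loudest event. Under these assumptions the posterior integrates in closed form, and the upper limit satisfies 1 - γ = e^{-x} [1 + Λ x / (1 + Λ)] with x = R_* V T, which is solved here by bisection. The function name and its arguments are hypothetical.

```python
import math

def rate_upper_limit(volume, time, lam, confidence=0.9):
    """Upper limit R_* for a uniform prior, given V, T and Lambda (lam)."""
    if volume <= 0 or time <= 0 or lam < 0:
        raise ValueError("volume and time must be positive, lam non-negative")
    weight = lam / (1.0 + lam)

    def tail(x):
        # Posterior probability that R * V * T exceeds x.
        return math.exp(-x) * (1.0 + weight * x)

    target = 1.0 - confidence
    lo, hi = 0.0, 1.0
    while tail(hi) > target:      # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):          # bisection; tail() decreases monotonically
        mid = 0.5 * (lo + hi)
        if tail(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / (volume * time)

# Units are arbitrary here: V = T = 1, so the result is R_* V T.
print(rate_upper_limit(1.0, 1.0, lam=0.0))    # loudest trigger certainly noise
print(rate_upper_limit(1.0, 1.0, lam=1e12))   # loudest trigger almost certainly a signal
```

For γ = 0.9 the product R_* V T is about 2.30 when Λ = 0 and approaches about 3.89 as Λ grows large, in line with the heuristic of a few signals occurring within the sensitive volume over the search time.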



It is clear from Eq. (19) that the decay of $p(R \mid \alpha_m, T)$
and the resulting $R_*$ depend critically on the sensitive
volume $V(\alpha_m)$. In previous sections we have shown how
ihope is highly effective at filtering out triggers due to
non-Gaussian noise, thus improving sensitivity, and in
the context of computing upper limits, we can quantify
the residual effects of non-Gaussian features on $V(\alpha_m)$.
In Fig. 15 we show the search efficiency for BNS signals,
i.e., the fraction of BNS injections found with IFAR
above a fiducial value (here set to the IFAR of the loudest
in-time noise trigger), as a function of distance, for one
month of S5 data and for a month of Gaussian noise with
the same PSDs.⁴ Despite the significant non-Gaussianity
of real data, the distance at which efficiency is 50% is
reduced by only ~10% and the sensitive search volume by
~30%, compared to Gaussian-noise expectations.
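
The relation between these two numbers can be checked with a short sketch. Assuming, purely for illustration, a sigmoid efficiency curve ε(r) (the functional form, the 30 Mpc scale, and the function names are hypothetical and not taken from the analysis), the sensitive volume is V = ∫ 4π r² ε(r) dr, and shrinking the whole curve by 10% in distance scales V by 0.9³ ≈ 0.73.

```python
import math

def sigmoid_efficiency(r, r50, width):
    """Toy efficiency curve (assumed form): ~1 nearby, 0.5 at r50, ~0 far away."""
    return 1.0 / (1.0 + math.exp((r - r50) / width))

def sensitive_volume(distances, efficiencies):
    """Trapezoid-rule estimate of V = integral of 4*pi*r^2*eff(r) dr."""
    volume = 0.0
    for i in range(1, len(distances)):
        r0, r1 = distances[i - 1], distances[i]
        f0 = 4.0 * math.pi * r0**2 * efficiencies[i - 1]
        f1 = 4.0 * math.pi * r1**2 * efficiencies[i]
        volume += 0.5 * (f0 + f1) * (r1 - r0)
    return volume

# Hypothetical numbers: distances in Mpc on a fine grid out to 100 Mpc.
grid = [0.05 * k for k in range(2001)]
gaussian_like = [sigmoid_efficiency(r, 30.0, 3.0) for r in grid]
# Same curve with every distance scale reduced by 10%.
real_like = [sigmoid_efficiency(r, 27.0, 2.7) for r in grid]

volume_ratio = sensitive_volume(grid, real_like) / sensitive_volume(grid, gaussian_like)
print(f"volume ratio = {volume_ratio:.3f}")
```

A 10% loss in distance therefore corresponds to a loss of roughly 27% in volume, consistent with the ~30% reduction quoted for the real data.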



V. DISCUSSION AND FUTURE DEVELOPMENTS

In this paper we have given a detailed description of 
the ihope software pipeline, developed to search for GWs 
from CBC events in LIGO and Virgo data, and we have 
provided several examples of its performance on a sam- 
ple stretch of data from the LIGO S5 run. The pipeline 
is based on a matched-filtering engine augmented by a 
substantial number of additional modules that imple- 
ment coincidence, signal-consistency tests, data-quality 
cuts, tunable ranking statistics, background estimation 
by time shifts, and sensitivity evaluation by injections. 
Indeed, with the ihope pipeline we can run analyses that 
go all the way from detector strain data to event signifi- 
cance and upper limits on CBC rates. 

The pipeline was developed over a number of years, 
from the early versions used in LIGO's S2 BNS search 
to its mature incarnation used in the analysis of S6 and 
VSR3 data. One of the major successes of the ihope 
pipeline was the mitigation of spurious triggers from non- 
Gaussian noise transients, to such an extent that the 
overall volume sensitivity is reduced by less than 20% 
compared to what would be possible if noise was Gaus- 
sian. Nevertheless, there are still significant improve- 
ments that can and must be made to CBC searches if 
we are to meet the challenges posed by analyzing the 
data of advanced detectors. In the following paragraphs, 
we briefly discuss some of these improvements and chal- 
lenges. 

Coherent analysis. As discussed above, the ihope 
pipeline comes close to the sensitivity that would be 
achieved if noise was Gaussian, with the same PSD. 
Therefore, while some improvement could be obtained by 
implementing more sophisticated signal-consistency tests 
and data-quality cuts, it will not be significant. If three 
or more detectors are active, sensitivity would be improved
in a coherent [74, 75] (rather than coincident)
analysis that filters the data from all operating detectors 
simultaneously, requiring consistency between the times 
of arrival and relative amplitudes of GW signals, as ob- 
served in each data stream. Such a search is challeng- 
ing to implement because the data from the detectors 



4 For Gaussian noise, we do not actually run injections through the 
pipeline, but compute the expected SNR, given the sensitivity of 
the detectors at that time, and compare with the largest SNR 
among Gaussian-noise in-time triggers. 
must be combined differently for each sky position,
significantly increasing computational cost.

Coherent searches have already been run for unmod- 
eled burst-like transients [75], and for CBC signals in 
coincidence with gamma-ray-burst observations [77], but 
a full all-sky, all-time pipeline like ihope would require 
significantly more computation. A promising compro- 
mise may be a hierarchical search consisting of a first co- 
incidence stage followed by the coherent analysis of can- 
didates, although the estimation of background trigger 
rates would prove challenging as time shifts in a coher- 
ent analysis cannot be performed using only the recorded 
single detector triggers but require the full SNR time se- 
ries. 

Background estimation. The first positive GW detection
requires that we assign a very low false-alarm probability
to a candidate trigger [14]. In the ihope pipeline,
this would necessitate a large number of time shifts, thus
negating the computational savings of splitting matched
filtering between two stages, or a different method of
background estimation [64, 78]. Whichever the solution,
it will need to be automated to identify signal candidates
rapidly for possible astronomical follow-up.
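
To give a rough sense of scale, the sketch below estimates how many time shifts are needed to resolve a given false-alarm probability. It rests on simplifying assumptions that are ours: every time shift contributes a background live time equal to the analyzed time T, background triggers are Poisson distributed, and the shifts are statistically independent. The smallest measurable FAR is then about 1/(N T) for N shifts, and the corresponding false-alarm probability over the analysis is 1 - exp(-1/N). The function names are hypothetical.

```python
import math

def fap_floor(n_shifts):
    """Smallest false-alarm probability resolvable with n_shifts time shifts."""
    # Background live time ~ n_shifts * T, so the smallest measurable FAR is
    # ~ 1 / (n_shifts * T); over an analysis of length T the false-alarm
    # probability is 1 - exp(-FAR * T).
    if n_shifts <= 0:
        raise ValueError("n_shifts must be positive")
    return -math.expm1(-1.0 / n_shifts)

def shifts_needed(target_fap):
    """Number of time shifts needed to resolve target_fap (same assumptions)."""
    if not 0.0 < target_fap < 1.0:
        raise ValueError("target_fap must lie strictly between 0 and 1")
    return math.ceil(-1.0 / math.log1p(-target_fap))

print(fap_floor(100))        # floor reached with the 100 time shifts used here
print(shifts_needed(1e-3))
print(shifts_needed(1e-6))
```

Under these assumptions the required number of shifts grows roughly as the inverse of the target probability, which is why a confident first detection is costly to establish with time shifts alone.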

Event-rate estimation. After the first detections, we
will begin to quote event-rate estimates rather than upper
limits. The loudest-event method can be used for this
[70], provided that the data are broken up so that much
less than one gravitational-wave signal is expected in each
analyzed stretch. There are, however, other approaches
[79] that should be considered for implementation.

Template length. The sensitive band of advanced detectors
will extend to lower frequencies (~10 Hz) than
their first-generation counterparts, greatly increasing the
length and number of templates required in a matched-filtering
search. Increasing computational resources may
not be sufficient, so we are investigating alternative approaches
to filtering [80-84] and possibly the use of
graphics processing units (GPUs).
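
The growth in template length can be estimated from the leading-order (Newtonian) chirp time, τ = (5/256) (G M_c / c³)^(-5/3) (π f_low)^(-8/3), where M_c is the chirp mass and f_low is the starting frequency of the template. The sketch below evaluates it for a 1.4 + 1.4 solar-mass binary. The 40 Hz starting frequency used for the first-generation comparison is an assumption made for illustration, and the function names are hypothetical; higher post-Newtonian corrections are ignored.

```python
import math

G_MSUN_OVER_C3 = 4.925491e-6  # G * M_sun / c^3 in seconds

def chirp_mass(m1, m2):
    """Chirp mass in the same units as m1 and m2."""
    return (m1 * m2) ** 0.6 / (m1 + m2) ** 0.2

def newtonian_chirp_time(m1, m2, f_low):
    """Leading-order time to coalescence (s) from GW frequency f_low (Hz).

    Masses are in solar masses; post-Newtonian corrections are neglected.
    """
    mc_seconds = chirp_mass(m1, m2) * G_MSUN_OVER_C3
    return (5.0 / 256.0) * mc_seconds ** (-5.0 / 3.0) * (math.pi * f_low) ** (-8.0 / 3.0)

tau_40 = newtonian_chirp_time(1.4, 1.4, 40.0)  # assumed first-generation cutoff
tau_10 = newtonian_chirp_time(1.4, 1.4, 10.0)  # advanced-detector cutoff from the text
print(f"from 40 Hz: {tau_40:.0f} s; from 10 Hz: {tau_10:.0f} s")
print(f"length ratio: {tau_10 / tau_40:.1f}")
```

Because τ scales as f_low^(-8/3), lowering the starting frequency by a factor of 4 lengthens the template by a factor of 4^(8/3), about 40, independent of the masses.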

Latency. The latency of CBC searches (i.e., the "wall-clock"
time necessary for search results to become available)
has decreased over the course of successive science
runs, but further progress is needed to perform prompt
follow-up observations of GW candidates with conventional
(electromagnetic) telescopes [85, 86]. The target
should be posting candidate triggers within minutes to
hours of data taking, which was in fact achieved in the
S6-VSR3 analysis with the MBTA pipeline [50].

Template accuracy. While the templates currently 
used in ihope are very accurate approximations to BNS 
signals, they could still be improved for the purpose of 
neutron star-black hole (NSBH) and binary black hole 
(BBH) searches [49]. It is straightforward to extend 
ihope to include the effects of spin on the progress of 
inspiral (i.e., its phasing), but it is harder to include the 
orbital precession caused by spins and the resulting wave- 
form modulations. The first extension would already im- 
prove sensitivity to BBH signals [57J[5S], but precessional 
effects are expected to be more significant for NSBH sys- 
tems [29"]l89]. 

Parameter estimation. Last, while ihope effectively
searches the entire template parameter space to identify
candidate triggers, at the end of the pipeline the only
information available about these is the estimated binary
masses, arrival time, and effective distance. Dedicated
follow-up analyses can provide much more detailed and
reliable estimates of all parameters [90-93], but ihope
itself could be modified to provide rough first-cut estimates.